Machine learning prediction of the failure of high-flow nasal oxygen therapy in patients with acute respiratory failure

Acute respiratory failure (ARF) is a prevalent and serious condition in the intensive care unit (ICU), often associated with high mortality rates. High-flow nasal oxygen (HFNO) therapy has gained popularity for treating ARF in recent years. However, the factors that predict HFNO failure in ARF patients remain poorly understood. This study aimed to explore early indicators of HFNO failure in ARF patients, using machine learning (ML) algorithms to identify individuals at elevated risk of HFNO failure more accurately. We developed seven predictive models and evaluated their performance using several metrics, including the area under the receiver operating characteristic curve, the calibration curve, and the precision-recall curve. The study enrolled 700 patients, with 490 in the training group and 210 in the validation group. The overall HFNO failure rate was 14.1% among the 700 patients. The ML algorithms demonstrated robust performance in our study. This research underscores the potential of ML techniques for creating clinically relevant models that predict HFNO outcomes in ARF patients. These models could play a pivotal role in enhancing the risk management of HFNO, leading to more patient-centered and personalized care approaches.



Model validation
To validate the prediction models, we randomly divided the data into a training set and a validation set using a 70:30 split, and then used a resampling method for internal validation within the training set. Finally, we validated the models again in the validation set. Additional technical information on the methods and parameter settings is provided in Supplementary Material Table 1.
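The splitting scheme described above can be illustrated with a minimal Python sketch. It is not the authors' R code: the seed, the function names, and the choice of fivefold resampling inside the training set are assumptions made for illustration, since the text only states that a resampling method was used.

```python
import random

def split_train_validation(n_patients, train_fraction=0.7, seed=42):
    """Randomly partition patient indices into training and validation sets."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, assumed value
    n_train = round(n_patients * train_fraction)
    return indices[:n_train], indices[n_train:]

def k_fold_indices(train_indices, k=5):
    """Yield (fit, holdout) index lists for k-fold resampling within the training set.

    k = 5 is an assumption; the paper does not name the resampling scheme.
    """
    folds = [train_indices[i::k] for i in range(k)]
    for i, holdout in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, holdout

train_idx, valid_idx = split_train_validation(700)
print(len(train_idx), len(valid_idx))
```

With 700 patients and a 70:30 split, this yields 490 training and 210 validation indices, matching the group sizes reported in the study.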

Model performance and explainability
To evaluate our models, we considered three predictive metrics: the area under the receiver operating characteristic curve (AUROC), the Brier score, and the area under the precision-recall curve (AUPRC). AUROC ranges from 0 to 1; a value of 0.5 corresponds to chance-level discrimination, and higher values are better. The Brier score is the mean squared difference between the predicted probability of HFNO failure and the actual outcome (0 or 1, where 1 indicates HFNO failure). It is bounded between 0 and 1, with lower values being better. We additionally compared the models by plotting their receiver operating characteristic (ROC) curves, precision-recall (PR) curves, and calibration curves. We used Shapley additive explanations (SHAP) values to explain the features in the training set. The SHAP summary, which combines feature importance with feature effects, was visualized with dot plots to present the distribution of SHAP values. The position on the y-axis was determined by the feature and that on the x-axis by the SHAP value. The features are ranked by importance. Moreover, partial dependence plots (PDPs) were created to visualize the average change in the probability of HFNO failure across the values of a predictor while keeping all other predictors constant 17 .
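To make the three metrics concrete, the following Python sketch computes them from scratch on a small, made-up set of labels and predicted probabilities. The data are purely illustrative and are not taken from the study. The AUPRC is estimated here as average precision, which is one common estimator and an assumption about how the area was computed.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auroc(y_true, y_prob):
    """Probability that a random failure case is ranked above a random
    non-failure case; tied scores count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_prob):
    """Step-wise area under the precision-recall curve (ties are not
    treated specially in this simplified version)."""
    ranked = sorted(zip(y_prob, y_true), key=lambda t: -t[0])
    true_positives = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / sum(y_true)

# Illustrative toy data only (1 = HFNO failure); not study data.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.2, 0.1]
print(brier_score(y_true, y_prob), auroc(y_true, y_prob),
      average_precision(y_true, y_prob))
```

Because HFNO failure is a minority outcome, the AUPRC is more sensitive than the AUROC to how well the few failure cases are ranked, which is one reason to report both.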

Sample size and statistical analysis
The pmsampsize package (https://search.r-project.org/CRAN/refmans/pmsampsize/html/pmsampsize.html) in R computes the minimum sample size required for the development of a new multivariable prediction model using the criteria proposed by Riley et al. 18 . Riley et al. lay out a series of criteria that the sample size should meet. These aim to minimize over-fitting and to ensure precise estimation of key parameters in the prediction model. Following the parameters of the pmsampsize package, we set the c-statistic to 0.80, the number of potential predictor parameters to 8, and the target event incidence to 14.1%. The minimum sample size required for new model development based on these inputs was 459, with 65 events. The sample size of the training set satisfies this minimum requirement for the development of a new multivariable prediction model.
The Kolmogorov-Smirnov test was used to test the normality of measurement data. Normally distributed data were expressed as means ± standard deviation, and skewed data were reported as medians with interquartile ranges (25th-75th percentiles). The two groups were compared using Student's t-test or the Mann-Whitney U test. Categorical data were expressed as percentages (%) and compared using the χ 2 test or Fisher's exact test. R software was used for all analyses (R Foundation for Statistical Computing, Vienna, Austria).
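The reported sample-size figures can be sanity-checked with simple arithmetic. The Python sketch below does not reproduce the pmsampsize calculation, whose binding criterion depends on an anticipated Cox-Snell R² that is not reported here. It only checks that 459 patients at a 14.1% incidence correspond to 65 expected events, that the training set (490 patients, 67 events) meets both minima, and it evaluates one of the Riley et al. criteria, precise estimation of the overall risk. The ±0.05 margin for that criterion is an assumed default, and the function names are hypothetical.

```python
import math

def expected_events(n, incidence):
    """Number of events expected in n patients, rounded up."""
    return math.ceil(n * incidence)

def n_for_overall_risk(incidence, margin=0.05, z=1.96):
    """Sample size needed to estimate the overall outcome proportion
    within +/- margin (the 0.05 margin is an assumption)."""
    return math.ceil((z / margin) ** 2 * incidence * (1 - incidence))

min_n, incidence = 459, 0.141       # values reported in the paper
train_n, train_events = 490, 67     # training set reported in the paper

min_events = expected_events(min_n, incidence)
training_set_adequate = train_n >= min_n and train_events >= min_events
print(min_events, n_for_overall_risk(incidence), training_set_adequate)
```

The overall-risk criterion alone requires far fewer patients than 459, which suggests that the reported minimum is driven by one of the over-fitting criteria rather than by this one.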

Ethics statement
The study was approved by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University (approval number: XYFY2022-KL464). All procedures were performed in accordance with the ethical standards of the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University on human experimentation and with the Helsinki Declaration of 1975. Owing to the retrospective and observational nature of the study, the requirement for informed consent was waived by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University.

Characteristics of participants
During the study period, 1671 patients diagnosed with ARF were initially enrolled. After the exclusion of 971 patients for various reasons, as detailed in Supplementary Material Fig. 1, 700 patients were included in the analysis. These patients were divided into two groups: a training set of 490 patients and a validation set of 210 patients. The general characteristics of these groups are summarized in Table 1. There were no statistically significant differences between the training set and the validation set (all P > 0.05). HFNO failed in 67 (13.7%) of the 490 patients in the training set and in 32 (15.2%) of the 210 patients in the validation set. Overall, the incidence of HFNO failure across the entire dataset was 14.1%. Based on the oxygenation index, the majority of patients in this study exhibited mild or moderate symptoms 3 , as detailed in Table 1 and Supplementary Material Fig. 2.
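The failure rates quoted above follow directly from the reported counts, as this short Python check illustrates. The helper name `pct` is hypothetical; the counts are those given in the text.

```python
def pct(events, total):
    """Format an event proportion as a percentage with one decimal place."""
    return f"{100 * events / total:.1f}%"

counts = {"training": (67, 490), "validation": (32, 210)}
rates = {name: pct(events, total) for name, (events, total) in counts.items()}

# Pool both sets to recover the overall incidence.
overall_events = sum(events for events, _ in counts.values())
overall_total = sum(total for _, total in counts.values())
rates["overall"] = pct(overall_events, overall_total)
print(rates)
```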

Feature importance
Table 2 shows the five most important variables of the LASSO regression model in the training set. Forty-three variables from the clinical characteristics were included in the LASSO regression analysis (Fig. 1A,B). We selected five variables with non-zero coefficients, namely the logistic organ dysfunction score (LODS), Glasgow coma score (GCS), prone position, lactic acid, and oxygenation index, to construct the models (Table 2). We plotted SHAP importance plots to reflect the significance of the five features. Each row represents the impact of a feature on HFNO failure, with higher SHAP values indicating a higher likelihood of HFNO failure (see Fig. 2 for details). The PDPs in Supplementary Material Fig. 3 show that an oxygenation index under 155 or lactic acid above 3.5, compared with their median values, increases the probability of HFNO failure.
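A partial dependence curve of the kind described above can be sketched in a few lines of Python. The predictor below is a hypothetical toy logistic model with made-up coefficients, and the patient rows are invented. Neither comes from the study. Only the averaging procedure itself is the point of the example.

```python
import math

def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row with that value
    and average the predicted probabilities over all rows."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def toy_predict(row):
    """Hypothetical model with made-up coefficients (NOT the study's model)."""
    score = (-1.0
             - 0.01 * (row["oxygenation_index"] - 150)
             + 0.4 * (row["lactic_acid"] - 2.0))
    return 1.0 / (1.0 + math.exp(-score))

# Invented patient rows, for illustration only.
rows = [
    {"oxygenation_index": 120, "lactic_acid": 1.5},
    {"oxygenation_index": 160, "lactic_acid": 2.2},
    {"oxygenation_index": 210, "lactic_acid": 4.0},
]
grid = [100, 150, 200, 250]
curve = partial_dependence(toy_predict, rows, "oxygenation_index", grid)
print(curve)
```

In this toy setting the averaged probability falls as the oxygenation index rises, which is the qualitative shape that a PDP would show for a feature whose low values raise the risk of failure.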

Model performance
Figure 4 displays the AUROC, Brier score, and AUPRC for the different predictive models in the training and validation sets. In the training set, the AUROC of all models ranged from 0.81 to 0.87. DeLong's test showed no statistically significant differences in AUROC among the models (P > 0.05). However, only three models had an AUROC greater than 0.80 in the validation set. Notably, the RF model's AUROC showed the smallest difference between the training and validation sets. To further compare the models, we plotted their calibration curves and PR curves. The STACK and RF models had lower Brier scores, and their calibration curves agreed more closely with the 45-degree line. Similarly, the RF model's Brier score showed the smallest difference between the training and validation sets. A larger AUPRC indicates better model performance. In the training set, only three models (LR, RF, and STACK) reached an AUPRC above 0.5. In the validation set, however, RF had a larger AUPRC than the other models. The RF model's AUPRC also showed the smallest difference between the training and validation sets. In view of these results, the RF model was deemed superior to the other models.
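Agreement of a calibration curve with the 45-degree line can be illustrated by binning predictions and comparing the mean predicted probability with the observed failure rate in each bin. The Python sketch below uses equal-width bins and invented toy data. Both the binning scheme and the data are assumptions for illustration and do not reflect how the study's curves were drawn.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Group predictions into equal-width probability bins and return
    (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    summary = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            summary.append((mean_pred, observed, len(members)))
    return summary

# Invented toy data (1 = HFNO failure); not study data.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.90, 0.95]
for mean_pred, observed, count in calibration_bins(y_true, y_prob):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated model has observed rates close to the mean predicted probabilities in every bin, so the plotted points lie near the diagonal.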
Finally, we established a dynamic scoring system to facilitate the application of the model. The website address of the dynamic scoring system is https://huxiaoyi.shinyapps.io/whole/.

Discussion
ARF remains a leading cause of mortality among patients in the ICU. Recently, HFNO has gained prominence in the treatment of ARF, effectively reducing the need for invasive mechanical ventilation (IMV) 19,20 . However, the failure of HFNO therapy can lead to prolonged ICU stays and increased mortality rates 21 . Therefore, early prediction of HFNO failure is crucial for identifying patients at higher risk and optimizing their treatment strategies.
Since the COVID-19 outbreak in 2020, the European Society for Critical Care has released clinical practice guidelines for HFNO 22 . In 2021, several Chinese medical associations also issued expert consensus guidelines on HFNO's clinical use 23 . These guidelines emphasize close monitoring of patients' vital signs within the first 1-2 h of HFNO application. They recommend upgrading respiratory support if failure predictors are observed, including a respiratory rate over 35 breaths/min, SpO 2 below 88%, a ROX index under 2.85, paradoxical thoracic and abdominal movements, or the use of accessory respiratory muscles. Although these guidelines are based on moderate-level evidence, there remains a gap in research to enhance this evidence level. The relatively recent introduction of HFNO as a treatment limits the availability of extensive data. This study aims to contribute valuable insights for monitoring HFNO, addressing this data scarcity.
Our data analysis reveals two crucial insights regarding the prediction of HFNO failure in ARF patients. Firstly, all models exhibit high discrimination in the training set, with some achieving an AUROC between 0.80 and 0.85. Secondly, the ability of these models to resist over-fitting, despite the inclusion of numerous features, is key to our methodology's effectiveness 24 . Traditional risk model development often follows the "one-in-ten" rule to limit features and prevent over-fitting, a constraint primarily due to the limitations of classic logistic regression. This traditional approach requires significant manual intervention and expert knowledge to exclude unnecessary features. ML algorithms can be helpful in developing more precise prognostication models that integrate complex interactions at a higher dimensional level 25 . Physicians now have access to a variety of resources to learn about ML fundamentals and techniques 26,27 . In our study's training set, the classical logistic regression model showed a higher AUROC, but it also exhibited the largest drop in the validation set, with a 0.099 AUROC difference, suggesting potential over-fitting. Our findings further confirm that ML models are generally more robust than traditional logistic regression models. However, despite their advanced algorithmic power, ML models, except LR, are often "black-box" algorithms, offering high algorithmic capabilities but low interpretability 28 . This raises several concerns: (1) clinicians may find it challenging to explain ML-based decisions, hindering the adoption of ML for critical decisions, and (2) emerging regulations and concerns about ML emphasize the need for interpretability and transparent predictive reasoning. To address these issues, our study includes forest plots of multivariate logistic analysis and SHAP importance plots to better elucidate the models' characteristics.
In our study, we analyzed a total of 700 patients, among whom 99 (14.1%) experienced HFNO failure, as detailed in Table 1. The low HFNO failure rate in our study can be attributed to four primary factors: (1) the patients with respiratory failure included in our study predominantly had mild or moderate symptoms, as evidenced by a median oxygenation index of 151.00; (2) the patient cohort was relatively young, with a median age of 49 years; (3) the retrospective nature of the study introduces inherent biases; and (4) HFNO was often complemented by prone positioning. Consequently, the success rate of HFNO observed in our study surpasses that reported in previous research 29,30 .
Independent risk factors for HFNO failure, including LODS and the oxygenation index, were identified (Fig. 3). Among these features, the oxygenation index was the strongest predictor of HFNO failure in patients with ARF, with the highest SHAP value among the features (Fig. 2). Accordingly, patients with a lower oxygenation index had a higher risk of HFNO failure. Previous studies on the prognosis of pulmonary infection-induced sepsis showed that the oxygenation index was an independent risk factor for in-hospital mortality 31,32 . The results of Liu et al. 33 also confirmed that the oxygenation index was an independent risk factor for non-invasive ventilation failure. LODS is an organ function-focused scoring system that reflects the severity of multiple organ dysfunction syndrome (MODS) 34 . This study also found that LODS was an independent risk factor for HFNO failure. Along with the oxygenation index, it is an important feature in the ML models (see Fig. 3 for details). Finally, the ML model was transformed into a dynamic scoring system, which further facilitates the use of the model and patients' understanding of disease prognosis.
This study has several limitations that are inherent in retrospective ML projects of this type. First, the study uses retrospective data, and prospective validation studies are still needed. Second, external validation with data from other institutions is required to determine the generalizability of the model. Third, although the ultimately validated ML model was robust and accurate, the dataset was still relatively small. Fourth, variable selection was conducted exclusively with the LASSO method. We did not employ other variable selection algorithms such as RF or Boruta, which could potentially have further enhanced the model's performance. Fifth, patients who had multiple ICU admissions were excluded, and only those aged between 18 and 89 years were included. The reasons for this are as follows: (1) to avoid the impact of duplicate data; (2) to maintain the independence of the dataset; (3) to reduce the potential for confounding factors; and (4) children and patients older than 90 years have difficulty cooperating with high-flow nasal catheter oxygen therapy, and their poor compliance with HFNO could potentially bias the outcomes. Finally, many features associated with HFNO failure are complex, and far more factors remain to be investigated and used to predict HFNO failure. Imaging features, such as those from chest X-ray and computed tomography, should therefore be included to improve the model in the future.

Conclusion
This work demonstrates the ability of ML techniques to produce clinically useful models for predicting HFNO outcomes.
The study may assist the risk management of HFNO and support more patient-centered and personalized care.

Figure 1. Demographic and clinical feature selection using LASSO regression. (A) The tuning parameter (lambda) in the LASSO model was selected using fivefold cross-validation with the minimum criteria. The relationship curve between partial likelihood deviance (binomial deviance) and log(lambda) was plotted. Dotted vertical lines were drawn at the optimal values by using the minimum criteria and the 1 standard error (SE) of the minimum criteria (the 1-SE criteria). (B) LASSO coefficient profiles of the 44 features. A coefficient profile plot was produced against the log(lambda) sequence. A vertical line was drawn at the value selected using fivefold cross-validation, where the optimal lambda resulted in 5 features with non-zero coefficients. LASSO least absolute shrinkage and selection operator.

Figure 2. SHAP importance plots of HFNO failure for the ML model. The position on the y-axis was determined by the feature and that on the x-axis by the SHAP value. The length of the SHAP value indicates the importance of the features. LODS Logistic Organ Dysfunction Score, GCS Glasgow Coma Score, SHAP Shapley.

Figure 3. Forest plot of multivariate logistic regression analysis. LODS Logistic Organ Dysfunction Score, GCS Glasgow Coma Score, SHAP Shapley, OR odds ratio, CI confidence interval.

Figure 4. A series of performance metrics for the ML models. (A,B) The receiver operating characteristic (ROC) curves in the training and validation sets. (C,D) The calibration curves in the training and validation sets. (E,F) The precision-recall (PR) curves in the training and validation sets. LODS Logistic Organ Dysfunction Score, GCS Glasgow Coma Score, SHAP Shapley, OR odds ratio, CI confidence interval, AUROC area under receiver operating characteristic, AUPRC area under precision-recall curve.

Table 1. Characteristics of patients in the training and testing data sets. SOFA Sepsis-related Organ Failure Assessment, LODS Logistic Organ Dysfunction Score, SAPSII Simplified Acute Physiology Score, GCS Glasgow Coma Score, COPD chronic obstructive pulmonary disease, AKI acute kidney injury, WBC white blood cell, Hb hemoglobin, PLT platelet count, ALT alanine aminotransferase, AST aspartate aminotransferase, ALB albumin, Cr creatinine, BUN blood urea nitrogen, PT prothrombin time, INR international normalized ratio. *Median (Q1, Q3).
Scientific Reports | (2024) 14:1825 | https://doi.org/10.1038/s41598-024-52061-z

Table 2. LASSO regression results of important variables related to HFNO failure (training dataset).